Abstract

In this short report, we discuss how coordinate-wise descent algorithms can be used to solve minimum variance portfolio (MVP) problems in which the portfolio weights are constrained by $l_q$ norms, where $1 \le q \le 2$. A portfolio whose weights are regularised by such norms is called a sparse portfolio (Brodie et al. 2009), since these constraints facilitate sparsity (zero components) of the weight vector. We first consider a case when the portfolio weights are regularised by a weighted $l_1$ and squared $l_2$ norm. Then two benchmark data sets (Fama and French 48 industries and 100 size and BM ratio portfolios) are used to examine performances of the sparse portfolios. When the sample size is not large relative to the number of assets, sparse portfolios tend to have lower out-of-sample portfolio variances, turnover rates, active assets and short-sale positions, but higher Sharpe ratios than the unregularised MVP. We then show some possible extensions; in particular, we derive an efficient algorithm for solving an MVP problem in which assets are allowed to be chosen grouply.

1 Introduction

This short report discusses how coordinate-wise descent algorithms can be used to solve minimum variance portfolio (MVP) problems in which the portfolio weights are penalised by $l_q$ norms, where $1 \le q \le 2$. A portfolio whose weights are regularised by such norms is called a sparse portfolio, since these constraints facilitate sparsity (zero components) of the weight vector. We first consider a special case when the portfolio weights are regularised by a weighted $l_1$ and squared $l_2$ norm:

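$$\min_{\mathbf{w}} \; \mathbf{w}^T \Sigma \mathbf{w} + \lambda\alpha \|\mathbf{w}\|_{l_1} + \lambda(1-\alpha) \|\mathbf{w}\|_{l_2}^2 \quad \text{subject to } \mathbf{w}^T \mathbf{1}_p = 1, \tag{1}$$

where $\|\mathbf{w}\|_{l_1} = \sum_{i=1}^p |w_i|$, $\|\mathbf{w}\|_{l_2}^2 = \sum_{i=1}^p w_i^2$ and $\alpha \in [0, 1]$. We call $\lambda$ the penalty parameter, and $\alpha$ is the parameter for adjusting the relative weight of the $l_1$ and squared $l_2$ norms. Suppose we have $p$ assets, then $\dim(\mathbf{w}) = p \times 1$. We also have

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{pmatrix}$$

as the covariance matrix of $p$ asset returns, and it is a $p \times p$ symmetric positive semidefinite matrix.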

Problem (1) can be seen as an MVP problem with the constraint $\alpha \sum_{i=1}^p |w_i| + (1-\alpha) \sum_{i=1}^p w_i^2 \le c$, where $c > 0$ is some constant. Such a weighted $l_1$ and squared $l_2$ norm is called the elastic net constraint (Zou and Hastie 2005) in regression-based variable selection problems. When $\alpha = 0$, the weights are only regularised by the squared $l_2$ norm, and the optimal solution of (1) is the same as that of the unregularised MVP problem in which $\Sigma$ is replaced by $\Sigma + \lambda I_{p \times p}$. When $\alpha = 1$, the weights are only penalised by the $l_1$ norm. Performances and properties of such $l_1$ norm constrained portfolios are documented in Brodie et al. (2009), DeMiguel et al. (2009), Fan, Zhang, and Yu (2009) and Welsch and Zhou (2007). When $0 < \alpha < 1$, the solution of (1) is equivalent to that of the MVP problem with covariance matrix $\Sigma + \lambda(1-\alpha) I_{p \times p}$ and $l_1$ penalty $\lambda\alpha \|\mathbf{w}\|_{l_1}$.



2 The Algorithm 

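Let $\gamma$ be the Lagrange multiplier of the constraint $\mathbf{w}^T \mathbf{1}_p = 1$. The Lagrangian function of problem (1) is given by

$$\begin{aligned} L(\mathbf{w}, \gamma; \Sigma, \lambda, \alpha) &= \mathbf{w}^T \Sigma \mathbf{w} + \lambda\alpha \|\mathbf{w}\|_{l_1} + \lambda(1-\alpha) \|\mathbf{w}\|_{l_2}^2 - \gamma(\mathbf{w}^T \mathbf{1}_p - 1) \\ &= \mathbf{w}^T \Sigma \mathbf{w} + \sum_{i=1}^p \left( \lambda\alpha |w_i| + \lambda(1-\alpha) w_i^2 - \gamma w_i \right) + \gamma. \end{aligned} \tag{2}$$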



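At the stationary point, the following conditions should hold:

$$\begin{aligned} 2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} + 2\lambda(1-\alpha) w_i - \gamma &= -\lambda\alpha && \text{if } w_i > 0, \\ 2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} + 2\lambda(1-\alpha) w_i - \gamma &= \lambda\alpha && \text{if } w_i < 0, \\ \Big| 2 \sum_{j \ne i} w_j \sigma_{ij} - \gamma \Big| &\le \lambda\alpha && \text{if } w_i = 0, \\ \mathbf{w}^T \mathbf{1}_p &= 1. \end{aligned} \tag{3}$$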



Friedman et al. (2007) show how powerful coordinate-wise descent algorithms are for solving regression-based variable selection problems with convex constraints. Suppose the objective function $f(\mathbf{w}) = f(w_1, \ldots, w_p)$ is differentiable and convex. The method starts by fixing $w_i$, for $i = 2, \ldots, p$, and then finding a value of $w_1$ to minimise (or reduce) $f(\mathbf{w})$. The next step is to fix $w_i$, for $i = 1, 3, \ldots, p$, and then find a value of $w_2$ to minimise (or reduce) the objective function, and so on. When the iteration has been done over all $w_i$, we go back and start the iteration again from $w_1$. The procedure repeats until we find the global minimum of $f(\mathbf{w})$.
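The cycling scheme just described can be sketched for a smooth convex quadratic. The function name `coordinate_descent_quadratic`, the test objective $f(\mathbf{w}) = \mathbf{w}^T \Sigma \mathbf{w} - \mathbf{b}^T \mathbf{w}$ and the fixed number of sweeps are illustrative assumptions, not part of the report; the sketch only shows the "fix all but one coordinate, minimise, move on" loop.

```python
def coordinate_descent_quadratic(sigma, b, sweeps=200):
    """Minimise f(w) = w' sigma w - b' w one coordinate at a time.

    sigma is assumed symmetric with a positive diagonal (illustrative setting).
    """
    p = len(b)
    w = [0.0] * p
    for _ in range(sweeps):
        for i in range(p):
            # With all other coordinates fixed, f is a parabola in w[i]:
            # df/dw_i = 2*sigma_ii*w_i + 2*cross - b_i = 0.
            cross = sum(sigma[i][j] * w[j] for j in range(p) if j != i)
            w[i] = (b[i] - 2.0 * cross) / (2.0 * sigma[i][i])
    return w


if __name__ == "__main__":
    print(coordinate_descent_quadratic([[1.0, 0.5], [0.5, 2.0]], [1.0, 1.0]))
```

For this small example the exact minimiser solves $2\Sigma\mathbf{w} = \mathbf{b}$, so the loop can be checked against a direct linear solve.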

Theoretical validity of using coordinate-wise descent algorithms to solve regularised penalty problems can be found in Tseng (2001). Suppose

$$f(\mathbf{w}) = f_0(\mathbf{w}) + \sum_{i=1}^p f_i(w_i).$$

It can be shown that if $f(\mathbf{w})$ is bounded, $f_0(\mathbf{w})$ is differentiable and convex, and $f_i(w_i)$ is also convex but not a function of $w_j$ for any $j \ne i$, $j = 1, \ldots, p$ (that is, $\sum_{i=1}^p f_i(w_i)$ is additively separable), then coordinate-wise descent algorithms are valid for solving $\min_{\mathbf{w}} f(\mathbf{w})$. The property of additive separability also applies when $f_i(\cdot)$ is a multivariate function, i.e. $w_i$ is a vector. Then the property holds if $w_i$ and $w_j$ do not have any overlapping elements.

It is easily seen that, given $\gamma$, $\Sigma$, $\lambda$ and $\alpha$, $L(\mathbf{w}, \gamma; \Sigma, \lambda, \alpha) - \gamma$ satisfies the above conditions if we set $f_0(\mathbf{w}) = \mathbf{w}^T \Sigma \mathbf{w}$ and $f_i(w_i) = \lambda\alpha |w_i| + \lambda(1-\alpha) w_i^2 - \gamma w_i$. Therefore, with fixed $\gamma$, $\Sigma$, $\lambda$ and $\alpha$, minimising (2) via a suitable coordinate-wise descent algorithm can attain its minimum. However, the portfolio weights solved for such a minimum may not satisfy the full investment constraint $\mathbf{w}^T \mathbf{1}_p = 1$. Note that in an unregularised MVP optimisation, we can make the constraint $\mathbf{w}^T \mathbf{1}_p = 1$ hold through the adjustment of $\gamma$. We can use the same trick here and finally achieve the optimal solution of (1).

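Following this strategy, we propose the following form to update each $w_i$,

$$w_i \leftarrow \frac{ST\left(\gamma - 2 \sum_{j \ne i} w_j \sigma_{ij}, \; \lambda\alpha\right)}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)}, \tag{4}$$

where

$$ST(x, y) = \mathrm{sign}(x) \left(|x| - y\right)_+$$

is the soft thresholding function. It can be shown that with fixed $\gamma$, $\Sigma$, $\lambda$, $\alpha$ and $w_j$, $j = 1, \ldots, p$, $j \ne i$, (4) is at the stationary point of (2).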
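As a small illustration of this single-coordinate step, the sketch below implements the soft thresholding function and the update of one weight for an elastic-net penalised MVP. The function names `soft_threshold` and `update_weight` and the argument layout (covariance matrix as a list of lists) are assumptions made for this example only.

```python
def soft_threshold(x, y):
    """ST(x, y) = sign(x) * max(|x| - y, 0), for y >= 0."""
    if x > y:
        return x - y
    if x < -y:
        return x + y
    return 0.0


def update_weight(i, w, sigma, gamma, lam, alpha):
    """One coordinate update of w[i] with all other weights held fixed."""
    # z_i = 2 * sum_{j != i} w_j * sigma_ij
    z = 2.0 * sum(sigma[i][j] * w[j] for j in range(len(w)) if j != i)
    return soft_threshold(gamma - z, lam * alpha) / (
        2.0 * (sigma[i][i] + lam * (1.0 - alpha))
    )
```

When $|\gamma - z_i| \le \lambda\alpha$ the update returns exactly zero, which is how the $l_1$ part of the penalty produces sparse weights.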

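As mentioned, $\gamma$ can be used to make the constraint $\mathbf{w}^T \mathbf{1}_p = 1$ hold. Our strategy for updating $\gamma$ is to take advantage of this property. Let $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$. From (3), it can be shown that when $w_i \ne 0$,

$$w_i = \begin{cases} \dfrac{\gamma - z_i - \lambda\alpha}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} & \text{if } w_i > 0, \\[2ex] \dfrac{\gamma - z_i + \lambda\alpha}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} & \text{if } w_i < 0. \end{cases}$$

Let $S^+ = \{i : w_i > 0\}$ and $S^- = \{i : w_i < 0\}$. Then

$$\begin{aligned} \mathbf{w}^T \mathbf{1}_p = \sum_{i=1}^p w_i &= \sum_{i \in S^+} \frac{\gamma - z_i - \lambda\alpha}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} + \sum_{i \in S^-} \frac{\gamma - z_i + \lambda\alpha}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} \\ &= \gamma \sum_{i \in S^+ \cup S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \sum_{i \in S^+ \cup S^-} \frac{z_i}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} \\ &\quad + \lambda\alpha \left( \sum_{i \in S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \sum_{i \in S^+} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} \right). \end{aligned}$$

By $\mathbf{w}^T \mathbf{1}_p = 1$, we propose the following form to update $\gamma$:

$$\gamma \leftarrow \frac{1 + \sum_{i \in S^+ \cup S^-} \frac{z_i}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \lambda\alpha \left( \sum_{i \in S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \sum_{i \in S^+} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} \right)}{\sum_{i \in S^+ \cup S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)}}.$$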






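For stability of the algorithm, we set the initial values $w_1 = w_2 = \cdots = w_p = 1/p$ and $\gamma > \lambda$. The updating process starts from $w_1$, then $w_2, \ldots$, and $w_p$. The updated vector $\mathbf{w}$ is then used to update $\gamma$. The process repeats until $\mathbf{w}$ and $\gamma$ converge. The algorithm can be summarised as follows.$^1$

Algorithm 1 Naive Coordinate-Wise Descent Updating for MVP Penalised by a Weighted $l_1$ and Squared $l_2$ Norm

1. Fix $\lambda$ and $\alpha \in [0, 1]$ at some constant levels.

2. Initialise $\mathbf{w} = \frac{1}{p} \mathbf{1}_p$ and $\gamma = \lambda \times 1.1$.

3. For $i = 1, \ldots, p$,

$$w_i \leftarrow \frac{ST\left(\gamma - 2 \sum_{j \ne i} w_j \sigma_{ij}, \; \lambda\alpha\right)}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)}.$$

4. Let $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$. Update $\gamma$ as

$$\gamma \leftarrow \frac{1 + \sum_{i \in S^+ \cup S^-} \frac{z_i}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \lambda\alpha \left( \sum_{i \in S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \sum_{i \in S^+} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} \right)}{\sum_{i \in S^+ \cup S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)}}.$$

5. Repeat 3 and 4 until $\mathbf{w}$ and $\gamma$ converge.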



Before the algorithm is used, one caution should be made. The coordinate-wise descent algorithm is easily implemented. The convergence is very fast when $\lambda$ is not small, since the resulting $\mathbf{w}$ is sparse. This remains true even when $p$ is very large. Fast and stable convergence is very important for the following empirical studies, since we rebalance the portfolio quite often over a long period. However, when $\lambda$ is too small, the resulting $\mathbf{w}$ will have only a few (or no) zero components, and the convergence becomes extremely slow. This is perhaps the main reason why coordinate-wise type algorithms are often ignored in solving optimisation problems whose solution vectors are often dense.
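A minimal Python sketch of Algorithm 1 is given here for illustration; it is not the author's R code. The function name `sparse_mvp`, the stopping tolerance `tol`, the iteration cap `max_iter`, and the choice to skip the $\gamma$ update when no weight is active are all assumptions of this sketch.

```python
def soft_threshold(x, y):
    """ST(x, y) = sign(x) * max(|x| - y, 0)."""
    if x > y:
        return x - y
    if x < -y:
        return x + y
    return 0.0


def sparse_mvp(sigma, lam, alpha, tol=1e-10, max_iter=10000):
    """Naive coordinate-wise descent for the elastic-net penalised MVP."""
    p = len(sigma)
    w = [1.0 / p] * p                 # step 2: equal initial weights
    gamma = 1.1 * lam                 # step 2: gamma slightly above lambda
    d = [2.0 * (sigma[i][i] + lam * (1.0 - alpha)) for i in range(p)]

    def z(i):
        # z_i = 2 * sum_{j != i} w_j * sigma_ij, using the current weights
        return 2.0 * sum(sigma[i][j] * w[j] for j in range(p) if j != i)

    for _ in range(max_iter):
        w_old, gamma_old = list(w), gamma
        for i in range(p):            # step 3: update each weight in turn
            w[i] = soft_threshold(gamma - z(i), lam * alpha) / d[i]
        pos = [i for i in range(p) if w[i] > 0.0]
        neg = [i for i in range(p) if w[i] < 0.0]
        active = pos + neg
        if active:                    # step 4 (skipped if all weights are zero)
            inv = sum(1.0 / d[i] for i in active)
            num = (1.0 + sum(z(i) / d[i] for i in active)
                   - lam * alpha * (sum(1.0 / d[i] for i in neg)
                                    - sum(1.0 / d[i] for i in pos)))
            gamma = num / inv
        change = max(abs(a - b) for a, b in zip(w, w_old))
        if change < tol and abs(gamma - gamma_old) < tol:
            break                     # step 5: stop once w and gamma settle
    return w, gamma


if __name__ == "__main__":
    print(sparse_mvp([[1.0, 1.2], [1.2, 4.0]], lam=1.0, alpha=1.0))
```

In the two-asset example the unregularised MVP shorts the second asset, while the penalised solution with $\lambda = 1$ and $\alpha = 1$ drops it. Note that with $\lambda = 0$ the initial $\gamma$ is zero, so a different starting value may be needed in that case.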

3 Empirical Results 

3.1 Profiles of the Optimal Weights 

We then use the algorithm to obtain the optimal sparse portfolios from real data. Figure 1 shows, under different $\alpha$ and $\lambda$, profiles of portfolio weights, the proportion of active portfolios $|S^+ \cup S^-| / p$, and the proportion of shortsale portfolios $|S^-| / p$. We set $\alpha = 0, 0.5, 1$ and $\lambda = 0, 0.5, \ldots, 15$. The data used is monthly return data of the Fama and French 25 portfolios formed on size and book-to-market. The period is a randomly selected 120 months (here from November 1986 to October 1996). The $\Sigma$ we calibrate into (1) is the sample covariance matrix of monthly returns during the selected period.

$^1$ Sample codes for the R programme are available from the author.

Corresponding to each $\lambda$, the sum of the optimal weights is 1. When $\alpha = 1$, we only have the $l_1$ penalty active, and the profiles behave very similarly to those of regression coefficients with the LASSO (Tibshirani 1996): some of the weights are exactly zero when $\lambda \gg 0$. However, unlike the LASSO profiles, the profiles of the sparse portfolio weights do not all vanish to zero when $\lambda$ becomes large, since the constraint $\mathbf{w}^T \mathbf{1}_p = 1$ needs to hold. When $\alpha = 0.5$, the profiles behave like those of regression coefficients with the elastic net. When $\alpha = 0$, it is equivalent to regularising the weights with the squared $l_2$ norm, and the profiles behave like those of regression coefficients with ridge regression.

For $\alpha \ne 0$, the proportion of active portfolios declines as the penalty parameter $\lambda$ becomes large. It is the active $l_1$ penalty that facilitates the sparsity, so $|S^+ \cup S^-| / p = 1$ when $\alpha = 0$. If $\lambda$ is large enough and $\alpha = 1$, it can be shown that the solution of (1) is the solution of the MVP problem with no-shortsale constraints. We have checked this property and find that when $\alpha = 1$ and $\lambda > 5$, the solution produced by the algorithm is almost the same as the optimal no-shortsale solution. Consequently, when $\alpha = 1$ and $\lambda$ is large, all of the active portfolio weights tend to be positive. When $\alpha = 0.5$ and $\lambda$ is large, no portfolio has any negative weight, but the number of active portfolios is different from the case of $\alpha = 1$. The result is not surprising, since $\alpha = 0.5$ is equivalent to replacing $\Sigma$ with $\Sigma + 0.5\lambda I_{p \times p}$ in (1) but only having the $l_1$ penalty $0.5\lambda \|\mathbf{w}\|_{l_1}$ active. Therefore, as $\lambda$ becomes large, the new problem will also have a new optimal no-shortsale solution.

3.2 A Comparison with Other Optimisation Solver 

We then compare the solutions produced by Algorithm 1 with those of another optimisation package for solving problem (1). Figure 2 presents cumulative differences between the solutions from Algorithm 1 and cvx (Grant and Boyd 2010). Let $\mathbf{w}_{cd,t}$ and $\mathbf{w}_{cvx,t}$ be the period $t$ solution vectors produced by Algorithm 1 and cvx respectively. The cumulative difference is defined as

$$\sum_{t=\tau+1}^{T} \left\| \mathbf{w}_{cd,t} - \mathbf{w}_{cvx,t} \right\|_{l_1}.$$

The data used is the same as in Section 3.3 ($\tau = 120$, and $T = 483$ and $555$ for the 48 industries and 100 size and BM portfolios respectively, see below). We fix $\alpha = 1$ and vary $\lambda$ at six different levels. As can be seen in Figure 2, the cumulative differences are small and decline with $\lambda$ (but not monotonically). We also use cvx to obtain the optimal no-shortsale weights. The dashed line (dotted line) shows the cumulative difference between the no-shortsale solutions and $\mathbf{w}_{cd,t}$ ($\mathbf{w}_{cvx,t}$) when $\lambda = 30$. It is a little surprising that $\mathbf{w}_{cd,t}$ is even closer to the no-shortsale solutions than $\mathbf{w}_{cvx,t}$ is.

3.3 Performances of the Sparse Portfolios

Next we look at how the sparse portfolios perform in the real world. The data sets used are monthly return data of the Fama and French 48 industry portfolios and 100 portfolios formed on size and book-to-market. The period we select for the 48 industry portfolios is from July 1969 to September 2009; for the 100 size and book-to-market portfolios, it is from July 1963 to September 2009. For the 48 industry portfolios, we do not find any missing data. However, we find 89 missing observations in the case of the 100 size and book-to-market portfolios. If a month has missing data, we use the equally weighted return of the other available portfolios in that month to replace the missing data.

We consider the cases of $\alpha = 1$ and $\alpha = 0.5$ with different $\lambda$. As mentioned, when $\alpha = 1$, similar results are already documented in previous research. However, those results are from different algorithms and different strategies for updating the portfolio. For example, Brodie et al. (2009) and Fan, Zhang, and Yu (2009) use a modified Least Angle Regression algorithm (LARS) (Efron et al. 2004). Thus one of our focuses is on whether the algorithm we use produces different numerical results. We also compare performances of the sparse portfolios with two benchmark portfolios: a naively diversified portfolio with equal weights $1/p$ and the no-shortsale portfolio. We rebalance the portfolios every month (updating the weights monthly). Figure 3 to Figure 6 show, under different $\lambda$, the out-of-sample variance of portfolio returns, Sharpe ratio, turnover rate, proportion of active portfolios, absolute position of shortsale portfolios, and the optimal $\gamma$ from solving (1). Corresponding to each $\alpha$, we vary $\lambda$ at 15 different levels, ranging from 0 to 30.

We now introduce the "rolling window" strategy used to update the portfolio and how we obtain these quantities. Let the monthly return of asset $i$ at period $t$ be $r_{i,t}$; the $\Sigma$ used to calibrate (1) is the sample covariance matrix of the monthly returns from the previous $\tau = 120$ months (from $t - 120$ to $t - 1$). We solve (1) at the end of period $t - 1$ to obtain the optimal weights for period $t$. Let the optimal weight of asset $i$ for period $t$ be $w_{i,t}$. Then the (monthly) portfolio return at period $t$ is

$$r_{por,t} = \sum_{i=1}^p w_{i,t} r_{i,t}.$$

We use the 48 industry portfolios as an example. In the data set, we have in total $T = 483$ months, and $t = 121, \ldots, 483$. We use $Var(r_{por,t})$ to denote the sample variance of $r_{por,t}$ over the $T - t + 1 = 363$ months. $Var(r_{por,t})$ is also called the out-of-sample variance of the portfolio returns. The Sharpe ratio $SR(r_{por,t})$ is the sample mean of $r_{por,t}$ divided by its sample standard deviation over the $T - t + 1 = 363$ months. We calculate the other four quantities monthly and show their boxplots over the $T - t + 1 = 363$ months.
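The two summary statistics above can be sketched in a few lines of Python. The helper names `portfolio_return` and `out_of_sample_stats` are hypothetical, and the sketch assumes the usual sample variance with an $n - 1$ divisor, which the text does not state explicitly.

```python
import statistics


def portfolio_return(weights, asset_returns):
    """r_por,t = sum_i w_{i,t} * r_{i,t} for a single period."""
    return sum(w * r for w, r in zip(weights, asset_returns))


def out_of_sample_stats(portfolio_returns):
    """Return (sample variance, Sharpe ratio) of realised portfolio returns.

    The Sharpe ratio follows the definition in the text: sample mean divided
    by sample standard deviation, with no risk-free rate subtracted.
    """
    variance = statistics.variance(portfolio_returns)   # n - 1 divisor
    sharpe = statistics.mean(portfolio_returns) / statistics.stdev(portfolio_returns)
    return variance, sharpe
```

In a rolling-window study, `portfolio_return` would be called once per month with the weights solved at the end of the previous month, and the resulting list passed to `out_of_sample_stats`.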

We then introduce the turnover rate we use here. Suppose at the end of period $t - 1$, we have wealth $\theta_{t-1}$ to be invested in these assets. Suppose the weight to be invested in asset $i$ is $w_{i,t}$ for period $t$. The value of the holding in asset $i$ is then $\theta_{t-1} w_{i,t} (1 + r_{i,t})$ at the end of period $t$. The total wealth at period $t$ is given by

$$\theta_t = \sum_{i=1}^p \theta_{t-1} w_{i,t} (1 + r_{i,t}) = \theta_{t-1} (1 + r_{por,t}).$$

If the weight of asset $i$ for $t + 1$ is $w_{i,t+1}$, then the amount of wealth to invest in asset $i$ becomes $\theta_t w_{i,t+1}$. We define the turnover rate of asset $i$ needed for period $t + 1$ as

$$to_{i,t+1} = \left| \frac{\theta_t w_{i,t+1} - \theta_{t-1} w_{i,t} (1 + r_{i,t})}{\theta_t} \right| = \left| w_{i,t+1} - w_{i,t} \frac{1 + r_{i,t}}{1 + r_{por,t}} \right|.$$

That is, the proportion of wealth at the end of period $t$ that needs to be invested in asset $i$ in order to reach the amount $\theta_t w_{i,t+1}$. The portfolio turnover rate for period $t + 1$ is then defined as

$$to_{por,t+1} = \sum_{i=1}^p to_{i,t+1}.$$

Knowing $to_{por,t+1}$ is useful for an investor to evaluate whether a strategy for updating the portfolio is worth implementing if transaction costs are taken into account. For example, if buying or selling a stock incurs fees of about 0.15% of the total value traded, and all of the assets considered are stocks, then the expected fees are $to_{por,t+1} \times 0.0015\, \theta_t$.
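A short sketch of the turnover calculation follows. It assumes the per-asset turnover is the absolute difference between the new target weight and the drifted old weight $w_{i,t}(1 + r_{i,t})/(1 + r_{por,t})$, and that the old weights sum to one; the function name `turnover` is hypothetical.

```python
def turnover(w_prev, w_next, asset_returns):
    """Per-asset and portfolio turnover needed to move from w_prev to w_next.

    w_prev: weights held during period t (assumed to sum to 1)
    w_next: target weights for period t + 1
    asset_returns: realised asset returns r_{i,t} over period t
    """
    r_por = sum(w * r for w, r in zip(w_prev, asset_returns))
    growth = 1.0 + r_por
    per_asset = [
        abs(wn - wp * (1.0 + r) / growth)
        for wp, wn, r in zip(w_prev, w_next, asset_returns)
    ]
    return per_asset, sum(per_asset)
```

With a proportional fee of 0.15%, the expected cost of a rebalance is then the portfolio turnover times `0.0015` times the current wealth.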

The last two quantities are defined as follows. Let $S_t^+ = \{i : w_{i,t} > 0\}$ and $S_t^- = \{i : w_{i,t} < 0\}$. The proportion of active portfolios at period $t$ is defined as

$$PAC_t = \frac{\left| S_t^+ \cup S_t^- \right|}{p}.$$

The absolute position of shortsale portfolios at period $t$ is defined as the sum of absolute values of the negative weights:

$$APS_t = \sum_{i \in S_t^-} |w_{i,t}|.$$
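Both quantities are simple functions of one period's weight vector. The sketch below is illustrative; the function name `active_and_short` is an assumption, and weights are treated as active only when they are exactly nonzero.

```python
def active_and_short(weights):
    """Return (PAC, APS) for one period's weight vector.

    PAC: proportion of assets with a nonzero weight.
    APS: sum of absolute values of the negative weights.
    """
    p = len(weights)
    n_pos = sum(1 for w in weights if w > 0)
    negatives = [w for w in weights if w < 0]
    pac = (n_pos + len(negatives)) / p
    aps = sum(abs(w) for w in negatives)
    return pac, aps
```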

The empirical results can be summarised as follows. 
1. When $\alpha = 1$, the results of $Var(r_{por,t})$ and $SR(r_{por,t})$ corresponding to different $\lambda$ are quite similar to those shown in Figure 2 of Brodie et al. (2009) and Figure 7 of Fan, Zhang, and Yu (2009). When $\lambda$ increases, the out-of-sample variance of the sparse portfolio declines to its minimum and then converges to the level attained when the no-shortsale portfolio is held. In the case of the 48 industry portfolios, the $1/p$ portfolio has the largest $Var(r_{por,t})$, while in the case of the 100 size and BM portfolios, the unregularised MVP (corresponding to $\lambda = 0$) has the largest $Var(r_{por,t})$. This seems to suggest that naive diversification is a better way to reduce portfolio variance than the unregularised MVP when $p$ is large relative to the sample size. But the portfolio variance of the $1/p$ portfolio is still larger than those of the portfolios whose weights are regularised in the FF 100 size and BM case. For the case of $\alpha = 0.5$, all the relevant results are very similar to the case of $\alpha = 1$.

2. When $\alpha = 1$, the Sharpe ratio increases monotonically with $\lambda$ in the case of the 48 industry portfolios, and achieves its maximum level when the no-shortsale portfolio is held. But in the case of the 100 size and BM portfolios, it reaches its maximum value when $\lambda$ is around 1.7, and then starts to decline to the level attained when the no-shortsale portfolio is held. It is unknown why the same method produces two different patterns of Sharpe ratio paths for the two data sets. It is also worth noting that when $\alpha = 0.5$, for the 48 industry portfolios, if we set $\lambda > 5$, we can obtain slightly better performances than those in the case of $\alpha = 1$; but this benefit does not appear in the case of the 100 size and BM portfolios.

3. As for the issue of transaction costs, it can be seen that the $1/p$ portfolio has the lowest turnover rate. This result is widely documented in previous research. However, for the sparse portfolio, we find that the turnover rate also decreases monotonically as $\lambda$ increases. This suggests that regularisation of portfolio weights also facilitates their stability over time, and is helpful for an investor making decisions when transaction costs are taken into account.

4. When $\alpha = 1$, $PAC_t$ and $APS_t$ both decline monotonically with $\lambda$. For $PAC_t$, the reason is that the $l_1$ penalty facilitates sparsity. For $APS_t$, the reason is that as $\lambda$ increases, the optimal solution of (1) converges to the solution of the no-shortsale constrained problem, and consequently the absolute position of shortsales declines to zero. For the case of $\alpha = 0.5$, $APS_t$ shows very similar behaviour to the case of $\alpha = 1$. However, $PAC_t$ no longer declines monotonically with $\lambda$. It reaches its minimum value and then slightly increases as $\lambda$ becomes large. In general, we have more active assets in the case of $\alpha = 0.5$ than $\alpha = 1$.

5. As a Lagrange multiplier, $\gamma$ seems less interesting than the optimal portfolio weights. However, we conjecture that $\gamma$ may play an important role in controlling how many assets should be included in the portfolio. Knowing how many assets should be included in a portfolio is helpful for allocating resources to monitor individual asset performances. The coordinate-wise descent algorithm used here cannot tell us exactly how many assets are already included while it is running. One thing we can do is adjust $\lambda$ to know this roughly while the algorithm is running. Some algorithms in regression-based variable selection have such an advantage; for example, Least Angle Regression (LARS). For a regression function, LARS selects only one covariate at each step, and after $k$ steps, there will be exactly $k$ covariates in the regression. Therefore we can exactly control the number of included variables by controlling the iterative steps of LARS. The LARS algorithm does not need the penalty parameter $\lambda$. The tricky part of LARS lies in its search direction and the length of each iterated step. Efron et al. (2004) show that the length of each iterated step of LARS can be optimally determined by its search direction in order to satisfy the "one step, one variable in" property. As can be seen in the figures, $\gamma$ increases monotonically with $\lambda$. Note that $\lambda$ determines how many assets should be included in the portfolio. Therefore we wonder whether, if we try to solve a sparse MVP problem in a way similar to LARS in regression-based variable selection, without $\lambda$, $\gamma$ can provide the information of how many assets have already been included in the portfolio during the iterations. Future research is needed to confirm this conjecture.

4 Some Extensions 

The sparse MVP problem is very similar to the regression-based variable selection problem. One of the 
differences between the two problems is that in the sparse MVP problem, sum of the portfolio weights 
is required to be 1. If we can guarantee that the constraint is satisfied, then many techniques used in 
variable selections can be applied to the problem. In this section we show some possible extensions. 



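4.1 Mean-Variance Portfolio

Until now we have only considered the problem of purely minimising portfolio variance rather than the following mean-variance portfolio optimisation,

$$\min_{\mathbf{w}} \; -\tau \mathbf{w}^T \boldsymbol{\mu} + \mathbf{w}^T \Sigma \mathbf{w} + \lambda\alpha \|\mathbf{w}\|_{l_1} + \lambda(1-\alpha) \|\mathbf{w}\|_{l_2}^2 \quad \text{subject to } \mathbf{w}^T \mathbf{1}_p = 1, \tag{5}$$

where $\tau$ is the risk preference parameter, and $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_p)$ is the vector of expected asset returns. Problem (5) can be solved via a modified version of Algorithm 1 if we use the following update forms, with $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$ as before:

$$w_i \leftarrow \frac{ST\left(\gamma - (z_i - \tau\mu_i), \; \lambda\alpha\right)}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)},$$

$$\gamma \leftarrow \frac{1 + \sum_{i \in S^+ \cup S^-} \frac{z_i - \tau\mu_i}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \lambda\alpha \left( \sum_{i \in S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} - \sum_{i \in S^+} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)} \right)}{\sum_{i \in S^+ \cup S^-} \frac{1}{2\left(\sigma_i^2 + \lambda(1-\alpha)\right)}}.$$

Practically, to solve (5), we need to estimate $\boldsymbol{\mu}$, and this is not an easy task. One naive way to avoid such estimation is to specify $\boldsymbol{\mu}$ according to some prior beliefs; for example, if we set $\boldsymbol{\mu} = \mathbf{0}$, (5) becomes (1). DeMiguel et al. (2009) and Jagannathan and Ma (2003) report unsatisfactory empirical evidence on portfolio selection via the traditional mean-variance portfolio optimisation. This is why we mainly consider (1) instead of (5) in this report.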

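4.2 Weighted $l_1$ Penalty

We can specify the penalty parameter according to our prior information about which assets are more important than the others. Consider a modified version of (1):

$$\min_{\mathbf{w}} \; \mathbf{w}^T \Sigma \mathbf{w} + \lambda \sum_{i=1}^p \eta_i |w_i| \quad \text{subject to } \mathbf{w}^T \mathbf{1}_p = 1, \tag{6}$$

where $\eta_i$ is a nonnegative constant. The weighted $l_1$ penalty is also used by the adaptive LASSO in Zou (2006). With $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$, the modified update forms are

$$w_i \leftarrow \frac{ST\left(\gamma - z_i, \; \lambda\eta_i\right)}{2\sigma_i^2},$$

$$\gamma \leftarrow \frac{1 + \sum_{i \in S^+ \cup S^-} \frac{z_i}{2\sigma_i^2} - \lambda \left( \sum_{i \in S^-} \frac{\eta_i}{2\sigma_i^2} - \sum_{i \in S^+} \frac{\eta_i}{2\sigma_i^2} \right)}{\sum_{i \in S^+ \cup S^-} \frac{1}{2\sigma_i^2}}.$$

In Brodie et al. (2009), $\eta_i$ is viewed as the transaction cost for the $i$th asset. One can also view $\eta_i$ as an increasing function of the risk of asset $i$, such as its $\sigma_i^2$ or beta.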



4.3 Berhu Penalty 

The coordinate-wise descent algorithm can be applied to other penalty forms. One example is the berhu penalty proposed by Owen (2006):

$$\lambda \sum_{i=1}^p \left[ |w_i| \, \mathbf{1}\{|w_i| < \delta\} + \frac{w_i^2 + \delta^2}{2\delta} \, \mathbf{1}\{|w_i| \ge \delta\} \right], \tag{7}$$

where $\mathbf{1}\{\cdot\}$ is the indicator function. The name "berhu" comes from the fact that (7) is the reverse of Huber's loss. The berhu penalty is convex and satisfies separability. It is also a compromise between $l_1$ and squared $l_2$ regularisation. If $|w_i|$ is less than some criterion $\delta > 0$, it is regularised by the $l_1$ norm. If $|w_i|$ is no less than $\delta$, then it is regularised by the squared $l_2$ norm.
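To make the two regimes concrete, the berhu penalty can be written as a short function. The names `berhu` and `berhu_penalty` are hypothetical; the sketch assumes the form $|w|$ below the threshold $\delta$ and $(w^2 + \delta^2)/(2\delta)$ at or above it, which joins continuously at $|w| = \delta$.

```python
def berhu(w, delta):
    """Berhu value of a single weight: l1 below delta, quadratic at or above."""
    a = abs(w)
    if a < delta:
        return a
    return (w * w + delta * delta) / (2.0 * delta)


def berhu_penalty(weights, lam, delta):
    """lambda times the sum of berhu values over all weights."""
    return lam * sum(berhu(w, delta) for w in weights)
```

At $|w| = \delta$ both branches give $\delta$, so small weights are penalised as in the LASSO while large weights are penalised quadratically.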

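With the berhu penalty and fixed $\delta$, the first order derivative of the Lagrangian function is given by

$$2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} - \gamma + \lambda \, \mathrm{sign}(w_i) \, \mathbf{1}\{|w_i| < \delta\} + \frac{\lambda w_i}{\delta} \, \mathbf{1}\{|w_i| \ge \delta\} = 0,$$

for $i = 1, \ldots, p$. It can be shown that at the stationary point,

$$\begin{aligned} 2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} - \gamma &= -\lambda && \text{if } 0 < w_i < \delta, \\ 2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} - \gamma &= \lambda && \text{if } -\delta < w_i < 0, \\ \Big| 2 \sum_{j \ne i} w_j \sigma_{ij} - \gamma \Big| &\le \lambda && \text{if } w_i = 0, \\ 2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} - \gamma &= -\frac{\lambda w_i}{\delta} && \text{if } \delta \le |w_i|, \\ \mathbf{w}^T \mathbf{1}_p &= 1. \end{aligned}$$

Fixing $w_j$ for $j = 1, \ldots, p$ and $j \ne i$, and letting $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$, we can solve the above equations for $w_i$. When $\delta \le |w_i|$,

$$w_i = \frac{\gamma - z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}}.$$

Since $2\sigma_i^2 + \frac{\lambda}{\delta} > 0$, $\delta \le |w_i|$ implies that $|\gamma - z_i| \ge 2\sigma_i^2 \delta + \lambda$. When $0 < w_i < \delta$, $w_i = \frac{\gamma - z_i - \lambda}{2\sigma_i^2}$, and this implies that $\gamma - z_i < 2\sigma_i^2 \delta + \lambda$. When $-\delta < w_i < 0$, $w_i = \frac{\gamma - z_i + \lambda}{2\sigma_i^2}$, and this implies that $\gamma - z_i > -2\sigma_i^2 \delta - \lambda$. We therefore propose the following form to update each $w_i$:

$$w_i \leftarrow \begin{cases} \dfrac{ST(\gamma - z_i, \lambda)}{2\sigma_i^2} & \text{if } |\gamma - z_i| < 2\sigma_i^2 \delta + \lambda, \\[2ex] \dfrac{\gamma - z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}} & \text{if } |\gamma - z_i| \ge 2\sigma_i^2 \delta + \lambda. \end{cases}$$

Since the updating form of $w_i$ is also linear in $\gamma$, we can easily derive the updating form of $\gamma$ via $\mathbf{w}^T \mathbf{1}_p = 1$.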

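Let $A^- = \{i : |\gamma - z_i| < 2\sigma_i^2 \delta + \lambda\}$ and $A^+ = \{i : |\gamma - z_i| \ge 2\sigma_i^2 \delta + \lambda\}$. Then

$$\begin{aligned} \sum_{i=1}^p w_i &= \sum_{i \in S^+ \cap A^-} \frac{\gamma - z_i - \lambda}{2\sigma_i^2} + \sum_{i \in S^- \cap A^-} \frac{\gamma - z_i + \lambda}{2\sigma_i^2} + \sum_{i \in A^+} \frac{\gamma - z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}} \\ &= \gamma \left( \sum_{i \in (S^+ \cap A^-) \cup (S^- \cap A^-)} \frac{1}{2\sigma_i^2} + \sum_{i \in A^+} \frac{1}{2\sigma_i^2 + \frac{\lambda}{\delta}} \right) - \left( \sum_{i \in (S^+ \cap A^-) \cup (S^- \cap A^-)} \frac{z_i}{2\sigma_i^2} + \sum_{i \in A^+} \frac{z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}} \right) \\ &\quad + \lambda \left( \sum_{i \in S^- \cap A^-} \frac{1}{2\sigma_i^2} - \sum_{i \in S^+ \cap A^-} \frac{1}{2\sigma_i^2} \right). \end{aligned}$$

With the constraint $\mathbf{w}^T \mathbf{1}_p = 1$, we propose the following form to update $\gamma$:

$$\gamma \leftarrow \frac{1 + \left( \sum_{i \in (S^+ \cap A^-) \cup (S^- \cap A^-)} \frac{z_i}{2\sigma_i^2} + \sum_{i \in A^+} \frac{z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}} \right) - \lambda \left( \sum_{i \in S^- \cap A^-} \frac{1}{2\sigma_i^2} - \sum_{i \in S^+ \cap A^-} \frac{1}{2\sigma_i^2} \right)}{\sum_{i \in (S^+ \cap A^-) \cup (S^- \cap A^-)} \frac{1}{2\sigma_i^2} + \sum_{i \in A^+} \frac{1}{2\sigma_i^2 + \frac{\lambda}{\delta}}}.$$

The algorithm for the berhu penalty is similar to Algorithm 1, with a modification to the updating of $w_i$ and $\gamma$. The algorithm can be summarised as follows.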

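Algorithm 2 Naive Coordinate-Wise Descent Updating for MVP Constrained by the Berhu Penalty

1. Fix $\lambda$ and $\delta$ at some constant levels.

2. Initialise $\mathbf{w} = \frac{1}{p} \mathbf{1}_p$ and $\gamma = \lambda \times 1.1$.

3. For $i = 1, \ldots, p$, with $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$: if $|\gamma - z_i| < 2\sigma_i^2 \delta + \lambda$,

$$w_i \leftarrow \frac{ST(\gamma - z_i, \lambda)}{2\sigma_i^2},$$

otherwise

$$w_i \leftarrow \frac{\gamma - z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}}.$$

4. Let $z_i = 2 \sum_{j \ne i} w_j \sigma_{ij}$. Update $\gamma$ as

$$\gamma \leftarrow \frac{1 + \left( \sum_{i \in (S^+ \cap A^-) \cup (S^- \cap A^-)} \frac{z_i}{2\sigma_i^2} + \sum_{i \in A^+} \frac{z_i}{2\sigma_i^2 + \frac{\lambda}{\delta}} \right) - \lambda \left( \sum_{i \in S^- \cap A^-} \frac{1}{2\sigma_i^2} - \sum_{i \in S^+ \cap A^-} \frac{1}{2\sigma_i^2} \right)}{\sum_{i \in (S^+ \cap A^-) \cup (S^- \cap A^-)} \frac{1}{2\sigma_i^2} + \sum_{i \in A^+} \frac{1}{2\sigma_i^2 + \frac{\lambda}{\delta}}}.$$

5. Repeat 3 and 4 until $\mathbf{w}$ and $\gamma$ converge.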
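The weight update in step 3 of Algorithm 2 can be sketched as a single function. The name `berhu_update` and its argument `g`, which stands for $\gamma - z_i$, are assumptions of this example; the $\gamma$ update of step 4 is not covered here.

```python
def soft_threshold(x, y):
    """ST(x, y) = sign(x) * max(|x| - y, 0)."""
    if x > y:
        return x - y
    if x < -y:
        return x + y
    return 0.0


def berhu_update(g, sigma_ii, lam, delta):
    """Update of one weight under the berhu penalty, with g = gamma - z_i."""
    if abs(g) < 2.0 * sigma_ii * delta + lam:
        # l1 regime: the new weight has absolute value below delta
        return soft_threshold(g, lam) / (2.0 * sigma_ii)
    # quadratic regime: the new weight has absolute value at least delta
    return g / (2.0 * sigma_ii + lam / delta)
```

The two branches agree at the boundary $|\gamma - z_i| = 2\sigma_i^2\delta + \lambda$, where both return a weight of magnitude $\delta$, so the update is continuous in $\gamma$.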

Figure 7 shows, when the berhu penalty is imposed, profiles of portfolio weights, the proportion of active portfolios, and the proportion of shortsale portfolios under different $\delta$ and $\lambda$. The data used is the same as in Figure 1. When $\delta = 1$, the profiles are almost the same as those in the case when only the $l_1$ constraint is active. However, if $\delta$ deviates from 1, the profiles look very different from the previous cases.
4.4 Adaptive Group Portfolio

Yuan and Lin (2006) propose the group LASSO, which can select covariates "grouply" in a regression problem: either all covariates in a certain group are selected or all of them are dropped. The penalty for variable selection in the group LASSO is the Euclidean norm $\|\cdot\|_{l_2}$. However, the group LASSO can only produce sparsity between groups; it cannot facilitate sparsity within a group. Friedman et al. (2010) propose the group sparse LASSO, in which the penalty is a hybrid form which combines the group penalty and the $l_1$ norm. They show that such a setting can facilitate sparsity between groups as well as within a group.

Categorising assets into different groups based on certain characteristics and then forming a portfolio according to these characteristics is almost a standard process in portfolio management. This kind of strategic selection of assets from certain groups may be due to practical reasons, e.g. financial regulations, an investor's risk preferences or a fund's own objective. However, such a strategy often concentrates on certain groups of assets and ignores the benefits of greater diversification. Thus, in terms of minimising the overall portfolio variance, it is not optimal. In the following, we adopt the idea of group selection but, at the same time, also aim to minimise the overall variance. To compromise fairly between the two goals, at first we neither pre-specify which groups of assets should be more important, nor do we ditch the boundaries of groups and purely minimise the portfolio variance. Instead we group the assets according to their common features, and then minimise the portfolio variance with the group penalty. Thus the typical outcome is: either some assets (not "all" if we take sparsity within a group into account) in a certain group are selected or all of them are dropped. Indeed this may produce zero weights for one's preferred groups of assets. However, if we can specify different penalties for different groups of assets, such individual preferences can be satisfied easily. Our method allows such a setting.

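Let $\mathbf{w}_l = (w_{l1}, \ldots, w_{lK})$ be the portfolio weight vector for the $K$ assets in group $l$, $l = 1, \ldots, L$. Without loss of generality, we assume the number of assets in each group is equal to $K$. The case of different numbers of assets in different groups can be easily accommodated in our algorithm. However, we do not allow different groups to contain the same assets. If different groups can contain the same assets, the block coordinate descent method may not guarantee a stable convergence. Let $\mathbf{w} = (\mathbf{w}_1, \ldots, \mathbf{w}_L)$, then the modified MVP problem is

$$\min_{\mathbf{w}} \; \mathbf{w}^T \Sigma \mathbf{w} + \lambda_1 \sum_{l=1}^L \|\mathbf{w}_l\|_{l_2, \Omega_l} + \lambda_2 \|\mathbf{w}\|_{l_1} \quad \text{subject to } \mathbf{w}^T \mathbf{1}_p = 1, \tag{8}$$

where $\|\mathbf{w}_l\|_{l_2, \Omega_l} = \sqrt{\mathbf{w}_l^T \Omega_l \mathbf{w}_l}$, and $\Omega_l$ is a kernel matrix, which is required to be symmetric and positive definite. By the definition, the Euclidean norm of $\mathbf{w}_l$ can then be expressed as $\|\mathbf{w}_l\|_{l_2, I_{K \times K}}$.

Suppose we have $p$ assets, then $p = L \times K$ and $\dim(\mathbf{w}) = p \times 1$. The penalty parameters $\lambda_1$ and $\lambda_2$ are still required to be nonnegative. For convenience, we re-express $\Sigma$ as

$$\Sigma = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1L} \\ A_{21} & A_{22} & \cdots & A_{2L} \\ \vdots & \vdots & \ddots & \vdots \\ A_{L1} & A_{L2} & \cdots & A_{LL} \end{pmatrix}.$$

For $l, l' = 1, \ldots, L$, if $l = l'$, $A_{ll'}$ is the $K \times K$ covariance matrix for asset returns in group $l$. If $l \ne l'$, $A_{ll'}$ is a $K \times K$ matrix of covariances of asset returns between groups $l$ and $l'$. For example, if $K = 5$ and $l \ne 1$,

$$A_{11} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{15} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{25} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{51} & \sigma_{52} & \cdots & \sigma_5^2 \end{pmatrix} \quad \text{and} \quad A_{1l} = \begin{pmatrix} \sigma_{1,s} & \sigma_{1,s+1} & \cdots & \sigma_{1,s+4} \\ \sigma_{2,s} & \sigma_{2,s+1} & \cdots & \sigma_{2,s+4} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{5,s} & \sigma_{5,s+1} & \cdots & \sigma_{5,s+4} \end{pmatrix},$$

where $s = 5l - 4$.

Now we show that a block coordinate-wise descent algorithm can be used to solve the MVP problem when portfolio weights are penalised by the group penalty. Here we consider a special case of (8) in which $\lambda_2 = 0$ and $\Omega_l = A_{ll}$. We call the resulting optimal solution the "adaptive group portfolio". Let $\gamma$ be the Lagrange multiplier of the constraint $\mathbf{w}^T \mathbf{1}_p = 1$. With such a setting, the update forms for $\mathbf{w}_l$ and $\gamma$ can be solved explicitly. Under this restriction, (8) becomes

$$\min_{\mathbf{w}} \; \mathbf{w}^T \Sigma \mathbf{w} + \lambda_1 \sum_{l=1}^L \|\mathbf{w}_l\|_{l_2, A_{ll}} \quad \text{subject to } \mathbf{w}^T \mathbf{1}_p = 1. \tag{9}$$

We can reparameterise $\mathbf{w}_l$ as

$$\mathbf{w}_l = A_{ll}^{-1/2} \mathbf{x}_l,$$

where $\mathbf{x}_l$ is also a $K \times 1$ vector. Therefore $\|\mathbf{w}_l\|_{l_2, A_{ll}} = \|\mathbf{x}_l\|_{l_2}$. Let $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_L)$, then problem (9) becomes

$$\min_{\mathbf{x}} \; \mathbf{x}^T \Sigma' \mathbf{x} + \lambda_1 \sum_{l=1}^L \|\mathbf{x}_l\|_{l_2} \quad \text{subject to } \sum_{l=1}^L \mathbf{x}_l^T A_{ll}^{-1/2} \mathbf{1}_K = 1, \tag{10}$$

where

$$\Sigma' = \begin{pmatrix} I & A_{11}^{-1/2} A_{12} A_{22}^{-1/2} & \cdots & A_{11}^{-1/2} A_{1L} A_{LL}^{-1/2} \\ A_{22}^{-1/2} A_{21} A_{11}^{-1/2} & I & \cdots & A_{22}^{-1/2} A_{2L} A_{LL}^{-1/2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{LL}^{-1/2} A_{L1} A_{11}^{-1/2} & A_{LL}^{-1/2} A_{L2} A_{22}^{-1/2} & \cdots & I \end{pmatrix}.$$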
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At the stationary point, for / = 1, . . . , i, 



2x( + 2A,, ^ Y. ^y V^:'- + ^1 ^7^ - ^Ai '^K = 0, If X, ^ 0, 



.J>^' 



^'11(2 



2A/ ' E ^y V^J + ^1^' - ^Al '^K = 0, If X; - 0, (11) 



J^l 



J2^TA;,hK = 1. 



1=1 



where si'is a, Kxl vector, and ||s;||, < 1. Let Bi — 2J2 Aij^n^^j ^-'^d A; (7) = 
The necessary and sufficient condition for x; = is 

A/(7)< Ai. 



Ai'l^K-A^^-^Bi 



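From (11), if $x_l \neq 0$,

$$
x_l = \left( 2 + \frac{\lambda_1}{\|x_l\|_{\ell_2}} \right)^{-1} A_{ll}^{-1/2} (\gamma 1_K - B_l).
$$

As $x_l \neq 0$ implies $w_l \neq 0$, so

$$
w_l = \left( 2 + \frac{\lambda_1}{\|w_l\|_{\ell_2, A_{ll}}} \right)^{-1} A_{ll}^{-1} (\gamma 1_K - B_l).
$$

We now show $\|w_l\|_{\ell_2, A_{ll}}$ can be solved as a function of $\gamma$. To see this, we know that

$$
\left( 2 + \frac{\lambda_1}{\|w_l\|_{\ell_2, A_{ll}}} \right) A_{ll}^{1/2} w_l = A_{ll}^{-1/2} (\gamma 1_K - B_l),
$$

and then

$$
\left( 4 + \frac{4 \lambda_1}{\|w_l\|_{\ell_2, A_{ll}}} + \frac{\lambda_1^2}{\|w_l\|_{\ell_2, A_{ll}}^2} \right) w_l^T A_{ll} w_l = (\gamma 1_K - B_l)^T A_{ll}^{-1} (\gamma 1_K - B_l).
$$

Thus we can obtain

$$
\left( 2 \|w_l\|_{\ell_2, A_{ll}} + \lambda_1 \right)^2 = \Lambda_l^2(\gamma).
$$

By $\|w_l\|_{\ell_2, A_{ll}} > 0$ and $\lambda_1 > 0$, if $\Lambda_l(\gamma) - \lambda_1 > 0$, the solution for $\|w_l\|_{\ell_2, A_{ll}}$ is given by

$$
\|w_l\|_{\ell_2, A_{ll}} = \frac{\Lambda_l(\gamma) - \lambda_1}{2}.
$$

Therefore if $w_l \neq 0$,

$$
w_l = \frac{1}{2} \left( 1 - \frac{\lambda_1}{\Lambda_l(\gamma)} \right) A_{ll}^{-1} (\gamma 1_K - B_l).
$$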
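The zero test (12) and the closed form for a non-zero group fit into one short function. The sketch below is illustrative only: `update_group` and its argument names are assumptions rather than the paper's notation, and $B_l$ is taken as given. The usage lines check the result against the $w$-space form of the first line of (11), namely $2 A_{ll} w_l + B_l + \lambda_1 A_{ll} w_l / \|w_l\|_{\ell_2, A_{ll}} - \gamma 1_K = 0$.

```python
import numpy as np

def update_group(A_ll, B_l, gamma, lam1):
    """Closed-form group weights for given gamma, B_l and penalty lam1."""
    r = gamma * np.ones(A_ll.shape[0]) - B_l   # gamma * 1_K - B_l
    A_inv_r = np.linalg.solve(A_ll, r)         # A_ll^{-1} (gamma * 1_K - B_l)
    stat = np.sqrt(r @ A_inv_r)                # Lambda_l(gamma)
    if stat <= lam1:                           # group-level test (12)
        return np.zeros_like(r)
    return 0.5 * (1.0 - lam1 / stat) * A_inv_r

# Illustrative numbers, not taken from the paper.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([0.1, -0.2])
gamma, lam1 = 3.0, 0.5
w_l = update_group(A, B, gamma, lam1)

# Residual of the stationarity condition; it should vanish for a non-zero group.
norm_A = np.sqrt(w_l @ A @ w_l)
residual = 2 * A @ w_l + B + lam1 * A @ w_l / norm_A - gamma * np.ones(2)
print(np.allclose(residual, 0.0))
```

Using `np.linalg.solve` rather than forming $A_{ll}^{-1/2}$ works because $\Lambda_l(\gamma)$ depends on $A_{ll}$ only through the quadratic form $(\gamma 1_K - B_l)^T A_{ll}^{-1} (\gamma 1_K - B_l)$.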



Combining with the group-level test condition (12), we propose the following form to update the group
weight vector $w_l$,

$$
w_l \leftarrow \frac{1}{2} \left( 1 - \frac{\lambda_1}{\Lambda_l(\gamma)} \right)_+ A_{ll}^{-1} (\gamma 1_K - B_l).
$$



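For updating $\gamma$, we can use the constraint $w^T 1_p = 1$. However, since the weights are non-linear in $\gamma$, we
cannot update $\gamma$ as in the previous cases. Let $S^c = \{l : w_l \neq 0\}$. Practically, the updated $\gamma$ can be solved
by minimising

$$
\left| 1 - \frac{1}{2} \sum_{l \in S^c} \left( 1 - \frac{\lambda_1}{\Lambda_l(\gamma)} \right) 1_K^T A_{ll}^{-1} (\gamma 1_K - B_l) \right|
$$

with respect to $\gamma$.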

For stability of our algorithm, we set the initial values $w_1 = w_2 = \cdots = w_p = 1/p$ and $\gamma > \lambda_1 \sqrt{\max_{i=1,\ldots,p} \sigma_i^2}$.
The updating process starts from $w_1$, then $w_2$, . . . , then $w_L$. The updated vector
$w$ is then used to update $\gamma$. The process terminates when $w$ and $\gamma$ converge. The algorithm can be
summarised as follows.

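Algorithm 3 Block Coordinate-Wise Descent Updating for MVP Penalised by Weighted
Euclidean Norm

1. Fix $\lambda_1$ at some non-negative constant level.

2. Initialise $w = \frac{1}{p} 1_{p \times 1}$ and $\gamma = \lambda_1 \sqrt{\max_{i=1,\ldots,p} \sigma_i^2} \times 1.1$.

3. Let $2 \sum_{j \neq l} A_{lj} w_j = B_l$. For $l = 1, \ldots, L$,

$$
w_l \leftarrow \frac{1}{2} \left( 1 - \frac{\lambda_1}{\Lambda_l(\gamma)} \right)_+ A_{ll}^{-1} (\gamma 1_K - B_l).
$$

4. Update $\gamma$ by

$$
\gamma \leftarrow \arg\min_{\gamma} \left| 1 - \frac{1}{2} \sum_{l \in S^c} \left( 1 - \frac{\lambda_1}{\Lambda_l(\gamma)} \right) 1_K^T A_{ll}^{-1} (\gamma 1_K - B_l) \right|.
$$

5. Repeat 3 and 4 until $w$ and $\gamma$ converge.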
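The steps of Algorithm 3 can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions, not the author's implementation: the names `adaptive_group_mvp`, `shrink`, `group_term` and `budget` are hypothetical; assets are assumed to be ordered group by group with equal group size $K$; and the one-dimensional minimisation in step 4 is replaced by bisection on the budget equation, with the positive part applied to every group instead of fixing the active set. With the $B_l$ held fixed, that version of the sum is non-decreasing in $\gamma$, which is what makes bisection applicable. Convergence of the outer loop is not guaranteed by this sketch.

```python
import numpy as np

def shrink(stat, lam1):
    """Positive part (1 - lam1/stat)_+, taken as 0 whenever stat <= lam1."""
    return 1.0 - lam1 / stat if stat > lam1 else 0.0

def adaptive_group_mvp(Sigma, K, lam1, tol=1e-10, max_iter=1000):
    """Sketch of Algorithm 3 for p = L*K assets stored group by group."""
    p = Sigma.shape[0]
    L = p // K
    idx = [np.arange(l * K, (l + 1) * K) for l in range(L)]
    A_inv = [np.linalg.inv(Sigma[np.ix_(g, g)]) for g in idx]
    ones = np.ones(K)
    w = np.full(p, 1.0 / p)                                  # step 2
    gamma = 1.1 * lam1 * np.sqrt(np.max(np.diag(Sigma)))     # step 2

    def B(l):
        # B_l = 2 * sum_{j != l} A_lj w_j, using the current w
        g = idx[l]
        return 2.0 * (Sigma[g] @ w - Sigma[np.ix_(g, g)] @ w[g])

    def group_term(l, gam, B_l):
        # 0.5 * (1 - lam1/Lambda_l(gam))_+ * A_ll^{-1} (gam * 1_K - B_l)
        r = gam * ones - B_l
        A_inv_r = A_inv[l] @ r
        stat = np.sqrt(max(r @ A_inv_r, 0.0))
        return 0.5 * shrink(stat, lam1) * A_inv_r

    def budget(gam, Bs):
        # Sum of all weights implied by gam, with the B_l held fixed.
        return sum(group_term(l, gam, Bs[l]).sum() for l in range(L))

    for _ in range(max_iter):
        w_old, gamma_old = w.copy(), gamma
        for l in range(L):                                   # step 3
            w[idx[l]] = group_term(l, gamma, B(l))
        Bs = [B(l) for l in range(L)]                        # step 4
        lo, hi, step = gamma - 1.0, gamma + 1.0, 1.0
        while budget(lo, Bs) > 1.0:                          # widen bracket downwards
            lo, step = lo - step, 2.0 * step
        step = 1.0
        while budget(hi, Bs) < 1.0:                          # widen bracket upwards
            hi, step = hi + step, 2.0 * step
        for _ in range(200):                                 # bisection on budget = 1
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if budget(mid, Bs) < 1.0 else (lo, mid)
        gamma = 0.5 * (lo + hi)
        if max(np.max(np.abs(w - w_old)), abs(gamma - gamma_old)) < tol:
            break                                            # step 5
    return w, gamma

# Toy block-diagonal covariance with two groups of K = 2 assets.
Sigma = np.diag([1.0, 1.0, 4.0, 4.0])
w, gamma = adaptive_group_mvp(Sigma, K=2, lam1=2.0)
print(w, gamma)
```

In this toy case the two groups are uncorrelated, so every $B_l$ is zero and the result can be worked out by hand: the higher-variance group fails test (12) and is dropped as a whole, leaving $w = (0.5, 0.5, 0, 0)$ with $\gamma = 1 + \sqrt{2}$. Setting `lam1=0.0` recovers the unregularised minimum variance weights $(0.4, 0.4, 0.1, 0.1)$.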

We use the Fama and French 100 size and BM ratio portfolios as an example. The data set contains
value-weighted returns for the intersections of 10 market cap portfolios and 10 BM ratio portfolios. We
categorise these 100 portfolios in two different ways. The first method groups them according to the
10 market cap levels, so that each group contains 10 different BM ratio portfolios. The second method
does the opposite: we group them according to the 10 BM ratio levels, so that each group contains 10 different
market cap portfolios. Thus both settings have $L = K = 10$. The results are shown in Figures
8, 9, 10 and 11. We let "Size-BM" and "BM-Size" denote the first and second method respectively. All
quantities are calculated in the same way as in section 3.2. The sample size for estimating the sample
covariance matrices is also 120.
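The two groupings amount to two different labelings of the same 10 x 10 grid. The sketch below assumes, purely for illustration, that the 100 return columns are ordered with the size decile as the outer index and the BM decile as the inner index; the actual column order of a given data file should be checked before use. The function names are hypothetical.

```python
import numpy as np

def group_labels(n_size=10, n_bm=10, by="size"):
    """Group label of each portfolio under the assumed size-major column order."""
    size_idx, bm_idx = np.divmod(np.arange(n_size * n_bm), n_bm)
    return size_idx if by == "size" else bm_idx

def contiguous_order(labels):
    """Permutation that places the members of each group next to each other."""
    return np.argsort(labels, kind="stable")

size_bm = group_labels(by="size")   # "Size-BM": one group per market cap level
bm_size = group_labels(by="bm")     # "BM-Size": one group per BM ratio level
order = contiguous_order(bm_size)

# With a 100 x 100 sample covariance S, S[np.ix_(order, order)] would put each
# BM-Size group in a contiguous K x K diagonal block, giving L = K = 10.
print(np.bincount(size_bm), order[:10])
```

Under the assumed ordering the Size-BM groups are already contiguous, so only the BM-Size grouping needs the permutation.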

Different grouping methods result in similar reductions in out-of-sample portfolio variances, but
significant differences in Sharpe ratios. As can be seen in Figure 11, the two grouping methods also
have different dynamics of $PAC_t$. Compared with Figure 5 and Figure 6, when the penalty parameter
becomes large, the Sharpe ratios for Size-BM are slightly better than in the cases without grouping.

Figure 1: The figure shows, under different $\alpha$ and $\lambda$, profiles of portfolio weights, proportion of active
portfolios, and proportion of shortsale portfolios. We vary $\lambda$ = 0, 0.5, . . . , 15. The data used is from
November 1986 to October 1996 of the Fama and French 25 portfolios formed on size and book-to-market
ratio.

Figure 2: The figure presents cumulative differences between the solutions from Algorithm 1 and cvx
(Grant and Boyd 2010) for solving problem (1). We vary $\lambda$ at 6 different levels. The data used is
monthly return data of the Fama and French 48 industries and 100 size and book-to-market portfolios. The
period we select for the 48 industry portfolios is from July of 1969 to September of 2009; for the 100
size and book-to-market portfolios, it is from July of 1963 to September of 2009.

Figure 3: The figure shows, under different $\lambda$ and trading strategies, the (out of) sample variance
of portfolio returns, Sharpe ratio, turnover rate, proportion of active portfolios, absolute position of
shortsale portfolios, and the optimal $\gamma$ from solving (1). We vary $\lambda$ at 15 different levels, ranging
up to 30. The data used is monthly return data of the Fama and French 48 industry portfolios. The period we
select is from July of 1969 to September of 2009. In the box plots, ns and 1/p denote the no-shortsale
and equally weighted portfolios respectively.

Figure 4: The figure shows, under different $\lambda$ and trading strategies, the (out of) sample variance
of portfolio returns, Sharpe ratio, turnover rate, proportion of active portfolios, absolute position of
shortsale portfolios, and the optimal $\gamma$ from solving (1). We vary $\lambda$ at 15 different levels, ranging
up to 30. The data used is monthly return data of the Fama and French 48 industry portfolios. The period we
select is from July of 1969 to September of 2009. In the box plots, ns and 1/p denote the no-shortsale
and equally weighted portfolios respectively.



[Figure panels titled "Alpha=1"; legend: L1+L2, Nonnegative, 1/p; horizontal axes: "lambda" and "lambda and trading strategies".]



Figure 5: The figure shows, under different $\lambda$ and trading strategies, the (out of) sample variance
of portfolio returns, Sharpe ratio, turnover rate, proportion of active portfolios, absolute position of
shortsale portfolios, and the optimal $\gamma$ from solving (1). We vary $\lambda$ at 15 different levels, ranging
up to 30. The data used is monthly return data of the Fama and French 100 size and book-to-market portfolios.
The period we select is from July of 1963 to September of 2009. In the box plots, ns and 1/p denote the
no-shortsale and equally weighted portfolios respectively. In this case, if one month has missing data,
we use equally weighted returns of other available portfolios at that month to replace the missing data.

Figure 6: The figure shows, under different $\lambda$ and trading strategies, the (out of) sample variance
of portfolio returns, Sharpe ratio, turnover rate, proportion of active portfolios, absolute position of
shortsale portfolios, and the optimal $\gamma$ from solving (1). We vary $\lambda$ at 15 different levels, ranging
up to 30. The data used is monthly return data of the Fama and French 100 size and book-to-market portfolios.
The period we select is from July of 1963 to September of 2009. In the box plots, ns and 1/p denote the
no-shortsale and equally weighted portfolios respectively. In this case, if one month has missing data,
we use equally weighted returns of other available portfolios at that month to replace the missing data.

Figure 7: The figure shows profiles of portfolio weights, proportion of active portfolios, and proportion
of shortsale portfolios under different $\delta$ and $\lambda$ when the berhu penalty is imposed. We vary $\lambda$ =
0, 0.5, . . . , 15. The data used here is the same as in Figure 1.

Figure 8: The figure shows, under different $\lambda$, the (out of) sample variance of portfolio returns and
Sharpe ratio for adaptive group portfolios. We vary $\lambda$ at 15 different levels, ranging up to 30. The
data used is monthly return data of the Fama and French 100 size and book-to-market portfolios. The
period we select is from July of 1963 to September of 2009. We categorise these 100 portfolios in two
different ways as described in section 4.4. In this case, if one month has missing data, we use equally
weighted returns of other available portfolios at that month to replace the missing data.

Figure 9: The figure shows, under different $\lambda$ and trading strategies, turnover rates and proportion of
active portfolios of adaptive group, no-shortsale and 1/p portfolios. We vary $\lambda$ at 15 different levels,
ranging up to 30. The data used is monthly return data of the Fama and French 100 size and book-to-market
portfolios. The period we select is from July of 1963 to September of 2009. We categorise these
100 portfolios in two different ways as described in section 4.4. In the box plots, ns and 1/p denote the
no-shortsale and equally weighted portfolios respectively. In this case, if one month has missing data,
we use equally weighted returns of other available portfolios at that month to replace the missing data.

Figure 10: The figure shows, under different $\lambda$ and trading strategies, absolute positions of shortsale of
adaptive group, no-shortsale and 1/p portfolios; and the optimal $\gamma$ from solving (9). We vary $\lambda$ at 15
different levels, ranging up to 30. The data used is monthly return data of the Fama and French 100
size and book-to-market portfolios. The period we select is from July of 1963 to September of 2009. We
categorise these 100 portfolios in two different ways as described in section 4.4. In the box plots, ns
and 1/p denote the no-shortsale and equally weighted portfolios respectively. In this case, if one month
has missing data, we use equally weighted returns of other available portfolios at that month to replace
the missing data.

Figure 11: The figure shows monthly dynamics of $PAC_t$ of two different adaptive portfolios Size-BM
and BM-Size with different $\lambda_1$, from July of 1973 to September of 2009. The dot-dash line (blue), solid
line (black), dashed line (red) and dotted line (green) in each graph correspond to $\lambda_1$ = 0.2, 1.2, 5 and 30.